Universal Grapheme-to-Phoneme Prediction Over Latin Alphabets
نویسندگان
چکیده
We consider the problem of inducing grapheme-to-phoneme mappings for unknown languages written in a Latin alphabet. First, we collect a data-set of 107 languages with known grapheme-phoneme relationships, along with a short text in each language. We then cast our task in the framework of supervised learning, where each known language serves as a training example, and predictions are made on unknown languages. We induce an undirected graphical model that learns phonotactic regularities, thus relating textual patterns to plausible phonemic interpretations across the entire range of languages. Our model correctly predicts grapheme-phoneme pairs with over 88% F1-measure.
منابع مشابه
Multitask Sequence-to-Sequence Models for Grapheme-to-Phoneme Conversion
Recently, neural sequence-to-sequence (Seq2Seq) models have been applied to the problem of grapheme-to-phoneme (G2P) conversion. These models offer a straightforward way of modeling the conversion by jointly learning the alignment and translation of input to output tokens in an end-to-end fashion. However, until now this approach did not show improved error rates on its own compared to traditio...
متن کاملOptimal Data Set Selection: An Application to Grapheme-to-Phoneme Conversion
In this paper we introduce the task of unlabeled, optimal, data set selection. Given a large pool of unlabeled examples, our goal is to select a small subset to label, which will yield a high performance supervised model over the entire data set. Our first proposed method, based on the rank-revealing QR matrix factorization, selects a subset of words which span the entire word-space effectively...
متن کاملSolving the Phoneme Conflict in Grapheme-to-Phoneme Conversion Using a Two-Stage Neural Network-Based Approach
To achieve high quality output speech synthesis systems, data-driven grapheme-to-phoneme (G2P) conversion is usually used to generate the phonetic transcription of out-of-vocabulary (OOV) words. To improve the performance of G2P conversion, this paper deals with the problem of conflicting phonemes, where an input grapheme can, in the same context, produce many possible output phonemes at the sa...
متن کاملUniversal grapheme-based speech synthesis
Grapheme-to-phoneme conversion follows the text processing step in speech synthesis. Typically, lexicons or Letter-to-Sound rules are used to map graphemes to phonemes. However, in some languages, such resources may not be readily available. In this paper, we describe a universal front end that supports using grapheme information alone to build usable speech synthesis systems. This work takes a...
متن کاملOnline discriminative training for grapheme-to-phoneme conversion
We present an online discriminative training approach to grapheme-to-phoneme (g2p) conversion. We employ a manyto-many alignment between graphemes and phonemes, which overcomes the limitations of widely used one-to-one alignments. The discriminative structure-prediction model incorporates input segmentation, phoneme prediction, and sequence modeling in a unified dynamic programming framework. T...
متن کامل